Abstract

Background: Cockroaches have been recognized as a powerful indoor allergen. Cockroach allergy can be a major factor in 
serious asthma and nasal allergy. Bioinformatics tools have been developed to identify potential allergens. The present 
study was conducted to identify potential allergens in Periplaneta americana (Linnaeus). 

Methods: The study focused on the identification of potential allergens among the characterized proteins of P. americana 
using web-based and publicly available allergen prediction tools that follow the FAO/WHO guidelines for prediction of 
allergenic proteins. P. americana protein sequences were retrieved from UniProtKB. The sequences obtained were analyzed 
using AlgPred. The potential allergens obtained were further analyzed by SDAP for confirmation. 

Results: Protein sequences (233 cases) of P. americana were obtained from UniProtKB out of which 25 were known aller- 
gens. Out of the remaining 208 proteins, 102 potential allergens were predicted by AlgPred. However, only 9 were found to 
be potential allergens after screening with SDAP. Arginine kinases, RNA polymerase II subunit, parcxpwnx02, peptidyl- 
prolyl cis-trans isomerase, hemocyanin subunit type I and type II, homologue of Sarcophaga proteinase and alpha amylase 
were confirmed to be potential allergens by SDAP. 

Conclusion: We have identified nine potential allergens in P. americana that may be used as preliminary support for further 
laboratory experiments. 
Introduction

Allergic disorders are among the most common 
chronic diseases in the developed world and an 
increasing problem in developing countries (1). 
Allergy is caused by the exaggerated and harm- 
ful response of the immune system to otherwise 
harmless substances known as allergens. There 
is overwhelming evidence that indoor domestic 
allergens play a key role in allergic disease. The 
primary arthropod allergens associated with aller- 
gic disease are house dust mites and cockroaches 
(2, 3). 

Cockroach allergens are one of the strongest risk 
factors predictive of allergic sensitization and 
asthma morbidity (4, 5). Cockroach extracts in- 
cluding cast skins, egg shells, and fecal material 
(6) have been shown to contain several major 
and minor allergens (7-10). Among the 3,500 
known species of cockroaches, the American
cockroach (Periplaneta americana) is frequently
encountered in homes. Allergens with masses
ranging from 6 to 120 kDa
from P. americana have been identified by vari- 
ous immunochemical techniques (11) and the 
functional importance of some of these have 
been determined. Two prominent proteins of 78 
and 72 kDa in Per a 3 have been reported to 
cause T cell proliferation in cockroach allergic 
patients (12). More recent data indicate that in- 
door insect allergens, including those of cock- 
roaches, are potent inducers of IL-5 and eotaxin- 
mediated esophageal eosinophilia (13). 
Despite the efforts of researchers, current
knowledge about P. americana allergens and their
cross-reactivity is still insufficient, at least
partly because of the difficulty of purifying
cockroach allergens from extracts in quantities
sufficient for detailed characterization.
According to the Universal Protein Resource
Knowledgebase (UniProtKB), a total of 233
proteins have been identified to date in
P. americana, of which only 25 are reported
allergens, which suggests that many allergens
may still be unidentified.
The use of bioinformatics tools is becoming
increasingly helpful for the initial screening of
compounds against existing experimentally
validated databases. AlgPred is a web server for
predicting allergenic proteins and for mapping
IgE epitopes on allergenic proteins with high
accuracy (14). SDAP (Structural Database of
Allergenic Proteins) is another web server that
contains information on allergens and can derive
correlations that can be used to predict the
allergenicity of novel proteins and
cross-reactivity between allergens (15).
The present study focuses on the identification
of potential allergens using these allergen
databases and bioinformatics tools.

Materials and Methods 

Amino acid sequences of the 233 P. americana
proteins in the UniProtKB database were
retrieved. Of these, 25 were known, reported
allergens. The remaining 208 proteins were tested
for allergenicity using AlgPred. The FASTA
sequence of each protein was submitted
individually to the server, which reported
results based on scanning for IgE epitopes, a
motif-based approach, an SVM-based method using
the amino acid composition of the protein, a
hybrid approach, and a BLAST search against
allergen-representative peptides (ARPs) (14).
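
Because each FASTA sequence was submitted individually, a bulk download has to be split into single records first. The following is a minimal Python sketch of that step, not the procedure actually used in the study. The function names `split_fasta` and `to_single_record` and the two demo records are hypothetical and serve only as illustration.

```python
from typing import Iterator, Tuple


def split_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from multi-record FASTA text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


def to_single_record(header: str, sequence: str, width: int = 60) -> str:
    """Format one record as standalone FASTA text wrapped at `width` columns."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Hypothetical input: made-up identifiers and sequences, not study data.
example = """>demo_1 hypothetical protein
MKTAYIAK
QRQISFVK
>demo_2 hypothetical protein
GSHMLEDP
"""
records = list(split_fasta(example))
```

Each `(header, sequence)` pair can then be passed to `to_single_record` to obtain the text for one submission.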

The potential allergens obtained through AlgPred
screening were further analyzed using SDAP (15).
Searches were first carried out according to the
FAO/WHO criteria for allergenicity prediction.
Proteins that shared more than 35% sequence
identity with an allergen over a segment of 80
residues, or an identical stretch of at least six
contiguous amino acids, were retained. These
candidates were further analyzed using the
property distance (PD) function method. A protein
that passed the initial step and gave a PD score
of less than 10 was considered to be a potential
allergen.
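
The two FAO/WHO checks described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions and is not SDAP's implementation: windows are compared without gaps, whereas alignment-based tools allow gaps, and sequences shorter than the window score 0. All function names are hypothetical.

```python
def has_identical_stretch(query: str, allergen: str, k: int = 6) -> bool:
    """True if any k contiguous residues of `query` occur verbatim in `allergen`."""
    if len(query) < k or len(allergen) < k:
        return False
    kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in kmers for i in range(len(query) - k + 1))


def best_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Highest percent identity over any ungapped pair of `window`-residue segments.

    Simplifying assumption: no gaps are allowed inside a window.
    """
    best = 0
    # Each diagonal pairs query[i] with allergen[i + shift].
    for shift in range(-(len(query) - 1), len(allergen)):
        q_start = max(0, -shift)
        a_start = q_start + shift
        length = min(len(query) - q_start, len(allergen) - a_start)
        if length < window:
            continue  # diagonal too short to hold a full window
        matches = [query[q_start + j] == allergen[a_start + j] for j in range(length)]
        current = sum(matches[:window])
        best = max(best, current)
        # Slide the window along the diagonal, updating the match count.
        for j in range(window, length):
            current += matches[j] - matches[j - window]
            best = max(best, current)
    return 100.0 * best / window


def passes_fao_who_screen(query: str, allergen: str, cutoff: float = 35.0) -> bool:
    """Retain a protein if either criterion is met, as described in the text."""
    return (has_identical_stretch(query, allergen)
            or best_window_identity(query, allergen) > cutoff)
```

In practice a query would be compared against every entry of an allergen database, and only proteins that pass would go on to the PD analysis.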

Results 

Following screening of the 208 P. americana
proteins with AlgPred (excluding very short
peptides that AlgPred was unable to screen), a
total of 102 proteins were considered potentially
allergenic. Further analysis of these 102 proteins
against the two FAO/WHO criteria in SDAP
indicated that 93 proteins did not fulfill both
criteria (Table 1). According to the PD scores
obtained, 9 proteins were predicted to be
potential allergens (Table 2). Alpha-amylase
(fragment), parcxpwnx02, homologue of Sarcophaga
26, 29 kDa proteinase, RNA polymerase II largest
subunit (fragment), and hemocyanin subunit type I
and II were predicted to be potentially
significant allergens because they had a PD score
of less than 10, whereas the arginine kinases and
peptidyl-prolyl cis-trans isomerase (fragment)
passed all the screening steps and had PD scores
of less than 3, which indicated that they are
highly significant potential allergens.
Of all the predicted allergens, parcxpwnx02 was
positive in the IgE mapping searches and was
found to contain an IgE epitope starting at
position 307 of the protein, with the sequence
LANSWNYDWGDNGY.
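
The screening counts and PD thresholds reported above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `pd_tier` and its labels are hypothetical, and no individual PD values are taken from the study, which reports ranges only.

```python
def pd_tier(pd_score: float) -> str:
    """Map a PD score to the wording used in the text (labels are illustrative)."""
    if pd_score < 3:
        return "highly significant potential allergen"
    if pd_score < 10:
        return "potential allergen"
    return "not predicted"


# Counts as reported in the Results.
total_proteins = 233
known_allergens = 25
screened = total_proteins - known_allergens            # proteins sent to AlgPred
algpred_positive = 102
sdap_confirmed = 9
excluded_by_sdap = algpred_positive - sdap_confirmed   # proteins listed in Table 1
```

The differences reproduce the 208 screened proteins and the 93 proteins excluded after SDAP analysis.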



Table 1: Proteins showing potential allergenicity with only AlgPred analysis

No. Protein name Accession Number
1. Diuretic Hormone P41538
2. Corazonin P11496
3. Bursicon (Fragments) P84118
4. Pyrokinin-6 P82693
5. Peptide Hormone 4 P82697
6. Peptide Hormone 3 P82696
7. Peptide Hormone 2 P82695
8. Peptide Hormone 1 P82694
9. Hemolymph Lipopolysaccharide P26305
10. Trehalase Inhibitor P19986
11. Troponin T 09XZ71
12. Sulfakinin 1 F368S5
13. Periviscerokinin-2 F81555
14. Periviscerokinin 2.2 P84422
15. Ubiquitin A1E2I6
16. Rab5-GTP binding protein B6ZLL4
17. Notch protein (Fragment) B6EBG3
18. Delta protein B6EBG2
19. Odorant-binding protein B6E9U9
20. Odorant-binding protein (Fragment) B6E9U8
21. Odorant-binding protein B6E9U7
22. Cxpwmw03 02L1Y7
23. Cxpwmw01 01PS51
24. MRNA, clone: 1. (Fragment) P92056
25. MRNA, clone: 2. (Fragment) P92055
26. MRNA, clone: 3. (Fragment) P92054
27. MRNA, clone: 4. (Fragment) P92053
28. MRNA, clone: 5. (Fragment) P92052
29. Lectin-related protein (Fragment) P92050
30. Lectin-related protein P92049
31. Lectin-related protein (Fragment) P92048
32. Lectin-related protein (Fragment) P92047
33. 26-kDa lectin (Fragment) 076155
34. Rsp60 (J76153
35. P10 O17447
36. Elongation factor 1-alpha (Fragment) (J02460
37. DNA-directed RNA polymerase (Fragment) 002459
38. Dynamin (Fragment)
39. Methionine aminopeptidase (Fragment) LIU U 1 A3
40. Putative uncharacterized protein (Fragment) DUU 169
41. AMP deaminase (Fragment) DUU 123
42. Pyrimidine biosynthesis (Fragment) D0USX9
43. Proteasome subunit (Fragment) D0USM1
44. F-box protein (Fragment) D0US62
45. Glu-pro-tRNA synthetase (Fragment)
46. Glycogen synthase (Fragment) D0UR99
47. Gelsolin (Fragment) D0UR43
48. ATP synthase (Fragment)
49. Acetylglucosaminyl-transferase (Fragment)
50. Clathrin heavy chain (Fragment) D0UO69
51. Clathrin heavy chain (Fragment) U0UU18
52. Glucosamine phosphate isomerase (Fragment) D0UPW5
53. GTP-binding protein (Fragment) D0UPR4
54. Syntaxin (Fragment) D0UPE9
55. Spliceosome-associated protein (Fragment) D0UPA1
56. Signal recognition particle (Fragment) D0UP45
57. Pre-mRNA splicing factor (Fragment) D0UNV4
58. Alpha-spectrin (Fragment) D0UND9
59. Alpha-spectrin (Fragment) D0UNA0
60. Alpha-spectrin (Fragment) D0UN65
61. Acetyl-CoA carboxylase (Fragment) D0UN29
62. domain binding protein (Fragment) D0UMV2
63. ATP synthase (Fragment) D0UM59
64. polymerase subunit 2 (Fragment) D0ULU7
65. RNA helicase (Fragment) D0ULN9
66. Protein kinase (Fragment)
67. Histone deacetylase (Fragment)
68. Gln-tRNA synthetase (Fragment)
69. Arg methyltransferase (Fragment) D0UK69
70. Regenectin U"V098
71. NADP-dependent isocitrate dehydrogenase (Fragment)
72. Putative transcription factor
73. 10 kDa LEG regeneration protein (Fragment) 091 WV5
74. Beta-1,4-glucanase 1 (Fragment) <j9NCt3
75. Beta-1,4-glucanase 2 (Fragment) U9JNCt2
76. 40S ribosomal protein S12 (Fragment) UoMlJo
77. Rab11 (Fragment)
78. Vitellogenin receptor UolVlrUZ
79. Large conductance calcium activated potassium channel pSlo spliceform 1B (Fragment) Uol"V0
80. Large conductance calcium activated potassium channel pSlo spliceform 4C (Fragment)
81. Large conductance calcium activated potassium channel pSlo spliceform 5B (Fragment)
82. Ryanodine receptor pRyR (Fragment)
83. Elongation factor-2 (Fragment)
84. RNA polymerase II largest subunit (Fragment) O6.TU05
85. TRPgamma cation channel 05Y.TT9
86. Parcxpwfx01 Q5MBV8
87. Parcxpwfx02 Q5MBV7
88. Parcxpwnx01 Q5MBV6
89. Parcxpwnx03 Q5G5C3
90. Parcxpwnx04 Q5G5C1
91. Adipokinetic hormone preproprotein Q5EY02
92. Putative uncharacterized protein (Fragment) Q5EY00
93. Rieske Fe-S protein (Fragment) Q5EXZ9

Table 2: Potential allergens predicted by AlgPred and SDAP in Periplaneta americana

No. | Protein name | Accession Number | AlgPred Analysis | SDAP Analysis, FAO/WHO Criteria: stretches of 6 contiguous amino acids identical to an allergen* | SDAP Analysis, FAO/WHO Criteria: % identity with an allergen over a window of 80 a.a. | SDAP Analysis: PD score
1. | Arginine kinase [Periplaneta americana] | D3JUE7 | Predicted allergen by SVM-based method and BLAST approach | Present | 93.75% with Bomb m 1.0101 from a.a. number 263 to 342 | <3
2. | Alpha-amylase (Fragment) | D2YVM9 | Predicted allergen by SVM-based method | Present | 61.25% with Blo t 4.0101 from a.a. number 35 to 114 | <10
3. | Homologue of Sarcophaga 26, 29 kDa proteinase | Q9U914 | Predicted allergen by SVM-based method | Present | 50.00% with Act c 1 from a.a. number 466 to 545 | >3 and <10
4. | Peptidyl-prolyl cis-trans isomerase (Fragment) | Q9U8K2 | Predicted allergen by SVM-based method and BLAST approach | Present | 77.50% with Mala s 6 | <3
5. | Parcxpwnx02 | Q5MBV5 | Predicted allergen by IgE mapping, BLAST and hybrid approach | Present | 42.50% with Act c 1 from a.a. number 257 to 336 | >3 and <10
6. | RNA polymerase II largest subunit (Fragment) | Q5EXZ8 | Predicted allergen by SVM-based method | Present | 41.25% with Der f 15 from a.a. number 16 to 95 | >3 and <10
7. | Hemocyanin subunit type II | B9W4N8 | Predicted allergen by SVM-based method | Present | 48.75% with Per a 3.0201 from a.a. number 112 to 191 | >3 and <10
8. | Hemocyanin subunit type I | B9W4N7 | Predicted allergen by SVM-based method, BLAST and hybrid approach | Present | 65.00% with Per a 3.0201 from a.a. number 100 to 179 | <3
9. | Arginine kinase | A1KY39 | Predicted allergen by SVM-based method | Present | 93.75% with Plo i 1 from a.a. number 257 to 336 | <3

* Present indicates that there are stretches of 6 contiguous amino acids identical to a known allergen.
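
As a quick consistency check on the identity column of Table 2, each percentage over an 80-residue window should correspond to a whole number of identical residues. The Python sketch below does that conversion. The percentages are those reported in Table 2, while the variable and function names are hypothetical.

```python
WINDOW = 80  # residues in the FAO/WHO comparison window

# Percent identity over the 80-residue window, as reported in Table 2.
reported_identity = {
    "D3JUE7": 93.75,
    "D2YVM9": 61.25,
    "Q9U914": 50.00,
    "Q9U8K2": 77.50,
    "Q5MBV5": 42.50,
    "Q5EXZ8": 41.25,
    "B9W4N8": 48.75,
    "B9W4N7": 65.00,
    "A1KY39": 93.75,
}


def identical_residues(percent: float, window: int = WINDOW) -> float:
    """Number of identical residues implied by a percent identity over `window`."""
    return percent * window / 100


counts = {acc: identical_residues(p) for acc, p in reported_identity.items()}

# The 35% cutoff corresponds to this many residues out of 80.
cutoff_residues = identical_residues(35.0)
```

Every reported value maps to a whole number of residues (for example, 93.75% is 75 of 80), and all nine exceed the 28 residues implied by the 35% cutoff.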



Discussion 

In silico protein analysis is a well-established
technique for the assessment of allergenicity and
immunological cross-reactivity (16, 17).
We screened 208 P. americana proteins using the
AlgPred and SDAP bioinformatics tools. Highly
significant prediction scores (PD<3) were
obtained for two arginine kinases (accession nos.
D3JUE7 and A1KY39). Interestingly, another
arginine kinase in P. americana (accession no.
B1A7S7) has been documented to be an important
allergen in the Thai population (18), which
correlates well with the highly significant scores
obtained for this enzyme in our study. Arginine
kinase isoforms have been reported in
Caenorhabditis elegans, and it has been suggested
that tissue-restricted expression of isoforms in
this family evolved early (19). In addition, high
identity (93.75%) was seen between the arginine
kinase of P. americana and that of Plodia
interpunctella (Indian meal moth), which also acts
as a powerful allergen (16). It is most likely
that the two arginine kinases reported in this
study are allergens.
The protein peptidyl-prolyl cis-trans isomerase
(accession no. Q9U8K2) also showed a significant
prediction score (PD<3). Peptidyl-prolyl cis-trans
isomerase belongs to the cyclophilin-type PPIase
family and shows significant identity (77.5%)
with a cyclophilin allergen from the yeast
Malassezia sympodialis (Mala s 6) (20).
Cyclophilins constitute a family of proteins
involved in many important cellular functions.
They have also been identified as a pan-allergen
family able to elicit IgE-mediated
hypersensitivity reactions (21, 22).
Homologue of Sarcophaga 26 and 29 kDa proteinase
(accession no. Q9U914) and parcxpwnx02
(accession no. Q5MBV5) were predicted as
potential allergens owing to their peptidase
activity. These proteins showed significant
identity, 50% and 42.5% respectively, with the
allergen Act c 1 (actinidin), a cysteine protease
and a major allergen in kiwi fruit. This 30 kDa
acidic protein is present in kiwi fruit in several
isoforms that differ in pI value (23). The
homologue of Sarcophaga 26 and 29 kDa proteinase
appears to eliminate foreign proteins in this
insect, is conserved in a wide variety of insects,
and participates in their defense mechanism (24).
German cockroach proteases are known to play a
role as allergens, participating in the cleavage
of matrix metalloproteinase (MMP-9) and thereby
remodeling the airway (25).
Hemocyanin subunit type II and type I proteins
show 48.75% and 65% identity, respectively, with
the known cockroach allergen family Per a 3
(Table 2). Per a 3 is among the most potent
allergens (26).
This result also becomes noteworthy if we
consider that certain other proteins with
hemocyanin domains are allergens (27, 28).
Another protein, RNA polymerase II largest subunit
(accession no. Q5EXZ8), showed 41.25% identity
with Der f 15, a major high-molecular-weight
allergen in dogs (29). Alpha-amylase (fragment)
(accession no. D2YVM9) was also found to be a
potential allergen. Fungal α-amylase is a known
dust allergen that is commonly found in bakery
products, particularly wheat or flour products
(30). It is possible that the α-amylase protein
of P. americana is also a dust allergen associated
with this cockroach species. Cockroaches are found
in flour, and it is also possible that the dust
allergen α-amylase is transferred from the flour
to the cockroaches.
In conclusion, in silico studies are valuable tools
for predicting which proteins should be given
priority in allergen research. We have identified
nine P. americana proteins that are potential
allergens and warrant further study.